1.0 INTRODUCTION


The primary objective of this analysis is to examine the book dataset for distributions, outliers, 
and anomalies in order to direct the testing of specific hypotheses. It also aims to understand the 
dataset visually through graphical representation. The scope of our project is to get a fair idea about 
books and to help readers choose the right book at the right time, so that they can absorb the 
knowledge those books offer. Using a combination of proprietary algorithms and billions of data points, 
Goodreads' recommendation engine can effectively predict what books readers will be interested in 
next (Chung, 2011). The algorithm checks how frequently books appear on the same shelf and whether 
they are enjoyed by the same population. The engine also distinguishes readers who are interested in 
similar kinds of books from those who have different tastes. 
According to Markwick (2019), Goodreads is a website that recommends and rates books. After 
reading a book, the user can rate it and see what others thought about it. The site also gives the 
crowd ranking of the book. There are two ratings: one is "my rating" and the other is the average 
rating. Plotting these two against each other shows the correlation between them. A positive 
correlation indicates that my rating agrees with the crowd rating; a negative correlation indicates 
that it does not. The rating of a book can also be examined in relation to its page length. 
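
The comparison of "my rating" with the average rating can be illustrated with a small Pearson correlation computation. This is only a sketch: the rating values below are invented for illustration and do not come from the dataset, and `pearson_r` is a hypothetical helper name, not something taken from Markwick (2019).

```python
import math

# Hypothetical ratings for five books (illustrative values, not from the dataset).
my_rating = [5.0, 3.0, 4.0, 2.0, 4.0]
average_rating = [4.2, 3.6, 4.0, 3.1, 3.9]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(my_rating, average_rating)
print(f"correlation between my rating and average rating: {r:.2f}")
```

A value of `r` close to +1 would mean the reader tends to agree with the crowd, while a value close to -1 would mean the reader tends to rate against it.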


Many people feel bored and uneasy while reading books that they do not like or do not need 
at that time. This is very common, and it may reduce their enthusiasm for reading. Yet the habit of 
reading books develops one's personality, knowledge, decision making, and more. 


Good book data analysis therefore helps people to analyse, customise, and pick the right books 
at the right time. This helps people of all age groups. 


2.0 PROPOSED SYSTEM AND RESULTS 
2.1 Exploratory Data Analysis 


An essential part of any data analysis or data science project is exploratory data analysis 
(EDA). The goal of EDA is to discover patterns, spot anomalies (outliers), and form hypotheses 
based on our understanding of the dataset. 


Using EDA, the dataset's numerical data may be summarised statistically, and different graphs 
can be generated to aid in data interpretation. 


Steps followed in EDA 
1) Importing libraries 
2) Reading data 
3) Descriptive statistics 
4) Missing value imputation 
5) Graphical representation 
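
Steps 1 to 4 above can be sketched with pandas as follows; plotting (step 5) is left to the figures later in this section. This is a minimal sketch under stated assumptions: the inline CSV text, its column names, and its values are invented for illustration, and median imputation is an assumed choice, since the report does not state which imputation method was used.

```python
# 1) Importing libraries
import io

import pandas as pd

# Hypothetical CSV excerpt: column names and values are assumptions for illustration.
csv_text = """title,authors,average_rating,num_pages,ratings_count
Book A,Author One,4.10,320,1500
Book B,Author Two,3.75,,230
Book C,Author Three,,210,45
Book D,Author One,4.40,540,9800
"""

# 2) Reading data (a real run would pass the path of the downloaded CSV instead).
df = pd.read_csv(io.StringIO(csv_text))

# 3) Descriptive statistics for the numeric columns.
summary = df.describe()
print(summary)

# 4) Missing value imputation: count the gaps, then fill numeric gaps
#    with the column median (an assumed strategy).
missing_before = df.isna().sum()
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
missing_after = df.isna().sum()
print(missing_before)
print(missing_after)
```

The median is used here because it is less sensitive to outliers than the mean, which matters for skewed columns such as ratings counts.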


2.2 Dataset Description 
Our group analysed a Goodreads dataset collected via the Goodreads API. This dataset 
contains detailed information about individual books and was obtained from the Kaggle site, 
from user Soumik. There are 10,352 unique titles and 11,126 unique ISBNs with a variety of attributes 
and data types. The twelve columns contain strings, integers, datetimes, and characters (Ralli, 2020; 
Jingchenliu, 2020). 
Information contained in the dataset includes: 
• Title: The title released by the publisher. 
• Authors: Identifying information about the book's writers. 
• Average rating: The overall average rating the book has received. 
• ISBN: The International Standard Book Number (ISBN) is a unique number that 
may be used to identify a specific book. 
• ISBN-13: A 13-digit ISBN that identifies the book, in place of the conventional 
10-digit ISBN. 
• Language code: The primary language in which the book is written. For instance, 
eng indicates that the book was written in English. 
• Number of pages: The total number of pages in the book. 
• Ratings count: The total number of times the book has been rated by readers. 
• Text reviews count: The total number of written text reviews the book has received. 
• Publication date: The first publication date of the book. 
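
The attributes listed above can be checked for type and uniqueness with a few pandas calls. The sketch below is illustrative only: the rows, the column names, and the month/day/year date format are assumptions, not values taken from the dataset. It also shows one plausible reason why unique ISBNs can outnumber unique titles: the same title may exist in several editions.

```python
import pandas as pd

# Hypothetical rows: two editions of "Book A" share a title but not an ISBN.
df = pd.DataFrame(
    {
        "title": ["Book A", "Book A", "Book B"],
        "isbn": ["0000000001", "0000000002", "0000000003"],
        "language_code": ["eng", "eng", "spa"],
        "num_pages": [320, 352, 210],
        "publication_date": ["9/16/2006", "11/1/2004", "4/30/2002"],
    }
)

# Parse the date strings into datetimes; the format string is an assumption.
df["publication_date"] = pd.to_datetime(df["publication_date"], format="%m/%d/%Y")

# Count distinct titles and ISBNs, and inspect the resulting column types.
unique_titles = df["title"].nunique()
unique_isbns = df["isbn"].nunique()
print(unique_titles, unique_isbns)
print(df.dtypes)
```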


The various data analyses and the implemented results are shown as screenshots. The pair 
plot and normalization are shown in Figure 1 and Figure 2. The correlation matrix and bar plot 
are shown in Figure 3 and Figure 4. 
Figure 1 - Pairplot 


normalized = s._normalize(x) 
plt.plot(normalized) 


[<matplotlib.lines.Line2D at 0x1822b7a9250>] 
Figure 2 - Normalization 
In [46]: from scipy import stats 
         from mlxtend.preprocessing import minmax_scaling 
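
The `minmax_scaling` import suggests that min-max normalization was applied. The report does not show the exact call, so the following is only a plain-Python sketch of the underlying formula, x' = (x - min) / (max - min). The helper `min_max_scale` and the sample values are hypothetical, and the handling of a constant column is a design choice of this sketch rather than a description of the library's behaviour.

```python
def min_max_scale(values, min_val=0.0, max_val=1.0):
    """Rescale values linearly so the smallest maps to min_val and the largest to max_val."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: every value maps to min_val (choice made for this sketch).
        return [min_val for _ in values]
    span = hi - lo
    return [min_val + (v - lo) * (max_val - min_val) / span for v in values]


# Hypothetical ratings counts (illustrative values, not from the dataset).
ratings_count = [45, 230, 1500, 9800]
scaled = min_max_scale(ratings_count)
print(scaled)
```

After scaling, columns with very different ranges (for example ratings count and average rating) can be drawn on the same axis without one dominating the plot.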


In [47]: # sb is assumed to be the seaborn alias imported in an earlier cell 
         fig, axes = plt.subplots(figsize=(10, 5)) 
         # numeric_only=True restricts the correlation to numeric columns (pandas 1.5+) 
         sb.heatmap(data=df.corr(numeric_only=True), annot=True, linewidths=1.5, fmt=".1f", ax=axes) 
         plt.show() 
Figure 3 - Correlation Matrix and Heatmap 


In [55]: # visualise the above comparison result 
         pred.plot(kind="bar", figsize=(13, 7)) 


Out[55]: <matplotlib.axes._subplots.AxesSubplot at 0x1822a5689d0> 
Figure 4 - Bar Plot 


3.0 CONCLUSION 


The scale, scope, and complexity of book data are likely to increase in the coming years. 
With our good-books analysis, people will not need to spend as much time choosing books that 
match their needs. We expect it to become a favoured way for people to choose what to read. 